Configurable `max_tokens`/`max_completion_tokens` key #399
Conversation
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Force-pushed from 68e69bc to ef981fd
Pull Request Overview
This PR implements configurable request keys for output token limits in OpenAI API calls. Instead of hardcoding both `max_tokens` and `max_completion_tokens` in all requests, the system now uses the appropriate key for each endpoint type through a new environment variable configuration.
- Adds `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` configuration mapping endpoint types to their respective output token keys
- Updates payload generation to use the configured key instead of setting both keys (see the sketch after this list)
- Fixes test assertions to match the new single-key approach
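A minimal sketch of the single-key approach, assuming Python and illustrative names (`EndpointType`, `MAX_OUTPUT_KEY`, and `build_payload` are hypothetical, not the exact identifiers in guidellm):

```python
from typing import Literal

EndpointType = Literal["text_completions", "chat_completions"]

# Mirrors the documented default for GUIDELLM__OPENAI__MAX_OUTPUT_KEY.
MAX_OUTPUT_KEY: dict[EndpointType, str] = {
    "text_completions": "max_tokens",
    "chat_completions": "max_completion_tokens",
}

def build_payload(endpoint_type: EndpointType, max_output_tokens: int) -> dict:
    """Set only the output-token key configured for this endpoint type,
    rather than hardcoding both max_tokens and max_completion_tokens."""
    payload: dict = {}
    payload[MAX_OUTPUT_KEY[endpoint_type]] = max_output_tokens
    return payload
```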
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/guidellm/config.py | Adds new `max_output_key` configuration with defaults for text and chat completions |
| src/guidellm/backend/openai.py | Updates payload generation to use the configurable key and adds type definitions |
| tests/unit/conftest.py | Removes duplicate token limit assertions and fixes mock response generation |
You may want to wait for Mark's review, but looks good to me.
Summary
Makes the `max_tokens` request key configurable through an environment variable per endpoint type. Defaults to `max_tokens` for the legacy `completions` endpoint and `max_completion_tokens` for `chat/completions`.
Details
Adds a `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option, a dict mapping route name -> output tokens key. The default is `{"text_completions": "max_tokens", "chat_completions": "max_completion_tokens"}`.
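Assuming the value is read from the environment as JSON (typical for settings keyed like `GUIDELLM__OPENAI__MAX_OUTPUT_KEY`), a hypothetical override that always sends `max_tokens`, e.g. for servers that reject `max_completion_tokens`, might look like:

```python
import os

# Hypothetical override, set before guidellm loads its settings:
# use max_tokens for both endpoint types.
os.environ["GUIDELLM__OPENAI__MAX_OUTPUT_KEY"] = (
    '{"text_completions": "max_tokens", "chat_completions": "max_tokens"}'
)
```

In practice this would normally be exported in the shell before invoking guidellm.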
Test Plan
Related Issues
Use of AI
## WRITTEN BY AI ##